Skip to content

Add project clone retries functionality#1613

Open
dkwon17 wants to merge 3 commits intodevfile:mainfrom
dkwon17:retries
Open

Add project clone retries functionality#1613
dkwon17 wants to merge 3 commits intodevfile:mainfrom
dkwon17:retries

Conversation

@dkwon17
Copy link
Copy Markdown
Collaborator

@dkwon17 dkwon17 commented Apr 10, 2026

Assisted-by: Claude Code Opus 4.6

What does this PR do?

Adds project clone retries functionality.

For the time being, by default, there will be up to 3 retries (4 attempts in total) with an exponential backoff. In a later PR, it would be nice to add a field in the DWOC to configure the number of retry attempts.

What issues does this PR fix or reference?

https://redhat.atlassian.net/browse/CRW-9779

Is it tested? How?

After building the image, install DWO:

export PROJECT_CLONE_IMG=quay.io/dkwon17/project-clone:retries
podman build -t "$PROJECT_CLONE_IMG" -f ./project-clone/Dockerfile .
podman push $PROJECT_CLONE_IMG
export DOCKER=podman
make docker
make install 

I have tested two cases:

Verify that regular cloning is working as intended

Create the following DW and verify that project-clone was successful:

curl -sL https://raw.githubusercontent.com/devfile/devworkspace-operator/refs/heads/main/samples/git-clone-sample.yaml | oc apply -f -

Successful project-clone init container:

...
2026/04/09 23:09:41 Processing project devworkspace-operator
2026/04/09 23:09:41 Cloning project devworkspace-operator to /projects/project-clone-2875820073/devworkspace-operator
Cloning into '/projects/project-clone-2875820073/devworkspace-operator'...
2026/04/09 23:09:49 Cloned project devworkspace-operator to /projects/project-clone-2875820073/devworkspace-operator
...
Verify that retries are working as intended.

Create the following DW. Note that in this DW, the devworkspace-operator git project has typos in the remote urls, which will cause failures during git clone.

kind: DevWorkspace
apiVersion: workspace.devfile.io/v1alpha2
metadata:
  name: git-clone-sample-devworkspace
spec:
  started: true
  template:
    projects:
      - name: web-nodejs-sample
        git:
          remotes:
            origin: "https://github.com/che-samples/web-nodejs-sample.git"
      - name: devworkspace-operator
        git:
          checkoutFrom:
            remote: origin
            revision: 0.21.x
          remotes:
            origin: "https://github.com/typo/devworkspace-operator.git"
            amisevsk: "https://github.com/typo/devworkspace-operator.git"
    commands:
      - id: say-hello
        exec:
          component: che-code-runtime-description
          commandLine: echo "Hello from $(pwd)"
          workingDir: ${PROJECT_SOURCE}/app
  contributions:
    - name: che-code
      uri: https://eclipse-che.github.io/che-plugin-registry/main/v3/plugins/che-incubator/che-code/latest/devfile.yaml
      components:
        - name: che-code-runtime-description
          container:
            env:
              - name: CODE_HOST
                value: 0.0.0.0

Verify in the project-clone init container logs, that retries have been logged:

2026/04/09 23:59:47 Failed git clone for project devworkspace-operator (attempt 1/4): failed to git clone from https://github.com/ddevfile/devworkspace-operator.git: exit status 128
2026/04/09 23:59:47 Retrying git clone for project devworkspace-operator (attempt 2/4) after 1s
2026/04/09 23:59:48 Cloning project devworkspace-operator to /projects/project-clone-2688951296/devworkspace-operator
Cloning into '/projects/project-clone-2688951296/devworkspace-operator'...
Error: passphrase file is missing in the '/etc/ssh/' directory
error: unable to read askpass response from '/.ssh-askpass/ssh-askpass.sh'
fatal: could not read Username for 'https://github.com': No such device or address
2026/04/09 23:59:48 Failed git clone for project devworkspace-operator (attempt 2/4): failed to git clone from https://github.com/ddevfile/devworkspace-operator.git: exit status 128
2026/04/09 23:59:48 Retrying git clone for project devworkspace-operator (attempt 3/4) after 2s
2026/04/09 23:59:50 Cloning project devworkspace-operator to /projects/project-clone-2688951296/devworkspace-operator
Cloning into '/projects/project-clone-2688951296/devworkspace-operator'...
Error: passphrase file is missing in the '/etc/ssh/' directory
error: unable to read askpass response from '/.ssh-askpass/ssh-askpass.sh'
fatal: could not read Username for 'https://github.com': No such device or address
2026/04/09 23:59:50 Failed git clone for project devworkspace-operator (attempt 3/4): failed to git clone from https://github.com/ddevfile/devworkspace-operator.git: exit status 128
2026/04/09 23:59:50 Retrying git clone for project devworkspace-operator (attempt 4/4) after 4s

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

Summary by CodeRabbit

  • New Features

    • Git clone now automatically retries failed attempts with exponential backoff between tries.
    • Number of retry attempts is configurable via PROJECT_CLONE_RETRIES (defaults applied, invalid values ignored, values clamped to a sensible maximum).
    • Previous failed clone data is cleaned up before each retry.
  • Bug Fixes

    • Improved retry and failure logging for clearer diagnostics and preserved underlying error details.

Signed-off-by: David Kwon <dakwon@redhat.com>
Assisted-by: Claude Code Opus 4.6
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dkwon17

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

Adds retry logic with exponential backoff to the initial git clone flow, cleans up failed temp clones between attempts, preserves underlying errors on final failure, and exposes retry count via a PROJECT_CLONE_RETRIES environment variable with validation and clamping.

Changes

Cohort / File(s) Summary
Git clone retry flow
project-clone/internal/git/setup.go
Replaces a single CloneProject attempt with a retry loop up to CloneRetries; logs intermediate failures; removes previous temp clone dirs between attempts; adds delayBeforeRetry implementing exponential backoff; uses %w to wrap final error; adds time import; updates copyright year.
Retry configuration & init
project-clone/internal/global.go
Adds exported CloneRetries variable and PROJECT_CLONE_RETRIES parsing in init() with default (3), max clamp (10), validation and logging; adds strconv/time imports and updates copyright year.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Worker as Worker Process
participant Git as Git Server/Client
participant FS as Filesystem
participant Timer as Backoff Timer
Worker->>Git: attempt CloneProject(project, tmpClonePath)
alt clone succeeds
Git-->>Worker: success
else clone fails
Git-->>Worker: error
Worker->>FS: os.RemoveAll(tmpClonePath) (cleanup)
Worker->>Timer: delayBeforeRetry(wait)
Timer-->>Worker: sleep complete
Worker->>Git: retry CloneProject (repeat up to CloneRetries)
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • ibuziuk

Poem

🐰 I dug a hole, then tried to clone,
When Git said "not today," I waited alone.
Backoff in hops, I cleaned the mess,
Tried once more — success, no stress.
A little hop, repo home, confessed. 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely summarizes the main change: adding retry functionality to project cloning with exponential backoff.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
project-clone/internal/global.go (1)

68-76: Consider adding an upper bound for CloneRetries.

The validation rejects negative values but allows arbitrarily large retry counts. With exponential backoff, high values like CloneRetries=30 would result in a delay of ~17 years before the final retry attempt (1s * 2^29).

Consider capping at a reasonable maximum (e.g., 10) to prevent configuration mistakes from causing excessively long waits.

Suggested change
+const maxCloneRetries = 10
+
 CloneRetries = defaultCloneRetries
 if val := os.Getenv(cloneRetriesEnvVar); val != "" {
     parsed, err := strconv.Atoi(val)
-    if err != nil || parsed < 0 {
-        log.Printf("Invalid value for %s: %q, using default (%d)", cloneRetriesEnvVar, val, defaultCloneRetries)
+    if err != nil || parsed < 0 || parsed > maxCloneRetries {
+        log.Printf("Invalid value for %s: %q (must be 0-%d), using default (%d)", cloneRetriesEnvVar, val, maxCloneRetries, defaultCloneRetries)
     } else {
         CloneRetries = parsed
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@project-clone/internal/global.go` around lines 68 - 76, The env parsing for
CloneRetries (using cloneRetriesEnvVar and defaultCloneRetries) only rejects
negatives but allows huge values; add an upper bound (e.g., maxCloneRetries =
10) and when parsed > maxCloneRetries log a warning and set CloneRetries =
maxCloneRetries, otherwise set CloneRetries = parsed; keep the existing
invalid-number fallback behavior and use the same log.Printf pattern to report
out-of-range values.
project-clone/internal/git/setup.go (1)

66-68: Use %w for proper error wrapping.

Using %s formats the error as a string but loses the error chain, preventing callers from using errors.Is() or errors.Unwrap() to inspect the underlying cause.

Suggested change
 if cloneErr != nil {
-    return fmt.Errorf("failed to clone project: %s", cloneErr)
+    return fmt.Errorf("failed to clone project: %w", cloneErr)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@project-clone/internal/git/setup.go` around lines 66 - 68, The error return
currently formats cloneErr with "%s" which discards the error chain; update the
return in the clone error path to wrap the original error using "%w" so callers
can use errors.Is/errors.Unwrap (i.e., change the return that references
cloneErr to use "%w" to wrap the error).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@project-clone/internal/git/setup.go`:
- Around line 66-68: The error return currently formats cloneErr with "%s" which
discards the error chain; update the return in the clone error path to wrap the
original error using "%w" so callers can use errors.Is/errors.Unwrap (i.e.,
change the return that references cloneErr to use "%w" to wrap the error).

In `@project-clone/internal/global.go`:
- Around line 68-76: The env parsing for CloneRetries (using cloneRetriesEnvVar
and defaultCloneRetries) only rejects negatives but allows huge values; add an
upper bound (e.g., maxCloneRetries = 10) and when parsed > maxCloneRetries log a
warning and set CloneRetries = maxCloneRetries, otherwise set CloneRetries =
parsed; keep the existing invalid-number fallback behavior and use the same
log.Printf pattern to report out-of-range values.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62363f36-541d-466e-bc69-a145673dc401

📥 Commits

Reviewing files that changed from the base of the PR and between fcee0cd and f8e5d42.

📒 Files selected for processing (2)
  • project-clone/internal/git/setup.go
  • project-clone/internal/global.go

Signed-off-by: dkwon17 <dakwon@redhat.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@project-clone/internal/git/setup.go`:
- Around line 66-68: The clone error return currently exits without removing the
temporary clone directory (tmpClonePath), leaving stale data; ensure
tmpClonePath is cleaned on failure by calling os.RemoveAll(tmpClonePath) before
returning (or by registering a defer cleanup immediately after creating
tmpClonePath) in the function that performs the clone (refer to the tmpClonePath
variable and the clone error handling block that returns fmt.Errorf("failed to
clone project: %w", cloneErr)).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a72c18e9-e8fb-47ea-9cd8-f1a9f4fc692e

📥 Commits

Reviewing files that changed from the base of the PR and between f8e5d42 and d312f58.

📒 Files selected for processing (2)
  • project-clone/internal/git/setup.go
  • project-clone/internal/global.go

Comment on lines +66 to 68
if cloneErr != nil {
return fmt.Errorf("failed to clone project: %w", cloneErr)
}
Copy link
Copy Markdown

@coderabbitai coderabbitai bot Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Final clone-failure path skips temp directory cleanup.

After the last failed attempt, Line 67 returns immediately and can leave partial repo data in tmpClonePath. Over repeated init-container failures, stale temp directories may accumulate under the mounted projects volume.

Proposed fix
 	if cloneErr != nil {
+		if err := os.RemoveAll(tmpClonePath); err != nil {
+			log.Printf("Warning: cleanup after final clone failure failed: %s", err)
+		}
 		return fmt.Errorf("failed to clone project: %w", cloneErr)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if cloneErr != nil {
return fmt.Errorf("failed to clone project: %w", cloneErr)
}
if cloneErr != nil {
if err := os.RemoveAll(tmpClonePath); err != nil {
log.Printf("Warning: cleanup after final clone failure failed: %s", err)
}
return fmt.Errorf("failed to clone project: %w", cloneErr)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@project-clone/internal/git/setup.go` around lines 66 - 68, The clone error
return currently exits without removing the temporary clone directory
(tmpClonePath), leaving stale data; ensure tmpClonePath is cleaned on failure by
calling os.RemoveAll(tmpClonePath) before returning (or by registering a defer
cleanup immediately after creating tmpClonePath) in the function that performs
the clone (refer to the tmpClonePath variable and the clone error handling block
that returns fmt.Errorf("failed to clone project: %w", cloneErr)).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if project-clone failed to clone, the workspace will still start. Having partial data in the tmp folder be useful for debugging.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkwon17, that's a good point — keeping the partial clone data in the temp directory after failure aids debugging. I'll withdraw this suggestion.


✏️ Learnings added
Learnt from: dkwon17
Repo: devfile/devworkspace-operator PR: 1613
File: project-clone/internal/git/setup.go:66-68
Timestamp: 2026-04-10T19:50:55.397Z
Learning: In `project-clone/internal/git/setup.go`, leaving the partial/stale clone data in `tmpClonePath` after a final clone failure is intentional: the workspace still starts even when project-clone fails, and the leftover data in the temp directory is useful for post-failure debugging. Do not flag the absence of `os.RemoveAll(tmpClonePath)` cleanup on the final error return path as an issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: CR
Repo: devfile/devworkspace-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-27T15:35:18.698Z
Learning: Applies to test/e2e/**/*_test.go : In e2e tests creating DevWorkspace, MUST use 'DeleteDevWorkspaceAndWait' in cleanup (AfterAll or AfterEach) to delete workspace and wait for PVC cleanup. Using 'DeleteDevWorkspace' alone may leave PVC and cause conflicts.

Learnt from: CR
Repo: devfile/devworkspace-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-27T15:35:18.698Z
Learning: Applies to controllers/workspace/**/*.go : When workspace fails, call 'failWorkspace()' with appropriate 'FailureReason' (ReasonBadRequest, ReasonInfrastructureFailure, ReasonWorkspaceEngineFailure, or ReasonUnknown)

Learnt from: CR
Repo: devfile/devworkspace-operator PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-27T15:35:18.698Z
Learning: Workspace bootstrapping is supported via 'controller.devfile.io/bootstrap-devworkspace: true' attribute when a devfile cannot be resolved but the project can be cloned. Flow: workspace starts with generic deployment, project-clone clones projects, finds devfile, DevWorkspace is updated with found devfile, workspace restarts with new definition.

Signed-off-by: dkwon17 <dakwon@redhat.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
project-clone/internal/global.go (2)

38-40: Use guideline-compliant constant naming for new retry constants.

cloneRetriesEnvVar, defaultCloneRetries, and maxCloneRetries are lower camelCase. Rename them to PascalCase or UPPER_SNAKE_CASE to align with the repo’s Go constant naming rule.

As per coding guidelines: "Use PascalCase or UPPER_SNAKE_CASE for constant names."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@project-clone/internal/global.go` around lines 38 - 40, The constants
cloneRetriesEnvVar, defaultCloneRetries, and maxCloneRetries use lowerCamelCase;
rename them to follow the project's constant naming rule (PascalCase or
UPPER_SNAKE_CASE) throughout the file and all usages—e.g.,
ProjectCloneRetriesEnvVar or CLONE_RETRIES_ENV_VAR, DefaultCloneRetries or
DEFAULT_CLONE_RETRIES, and MaxCloneRetries or MAX_CLONE_RETRIES—update their
declarations and all references (search for cloneRetriesEnvVar,
defaultCloneRetries, maxCloneRetries) to the new names to ensure compilation and
consistency.

18-31: Reorder import groups to match repository Go conventions.

The import block currently places a project-local package before third-party imports. Please order as: standard library, third-party/Kubernetes, then project-local.

As per coding guidelines: "Organize imports into three groups separated by blank lines: (1) standard library, (2) third-party + Kubernetes, (3) project-local."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@project-clone/internal/global.go` around lines 18 - 31, The import block is
misordered: move imports into three groups (standard library,
third-party/Kubernetes, then project-local) separated by blank lines;
specifically keep "errors", "fmt", "log", "os", "regexp", "strconv", "time"
together as the first group, then place go-git imports ("github.com/go-git/...",
"github.com/kevinburke/ssh_config") and any other third-party/Kubernetes
packages as the second group, and place the project-local package
"github.com/devfile/devworkspace-operator/pkg/library/constants" as the final
group; update the import ordering in the file's import block accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@project-clone/internal/global.go`:
- Around line 38-40: The constants cloneRetriesEnvVar, defaultCloneRetries, and
maxCloneRetries use lowerCamelCase; rename them to follow the project's constant
naming rule (PascalCase or UPPER_SNAKE_CASE) throughout the file and all
usages—e.g., ProjectCloneRetriesEnvVar or CLONE_RETRIES_ENV_VAR,
DefaultCloneRetries or DEFAULT_CLONE_RETRIES, and MaxCloneRetries or
MAX_CLONE_RETRIES—update their declarations and all references (search for
cloneRetriesEnvVar, defaultCloneRetries, maxCloneRetries) to the new names to
ensure compilation and consistency.
- Around line 18-31: The import block is misordered: move imports into three
groups (standard library, third-party/Kubernetes, then project-local) separated
by blank lines; specifically keep "errors", "fmt", "log", "os", "regexp",
"strconv", "time" together as the first group, then place go-git imports
("github.com/go-git/...", "github.com/kevinburke/ssh_config") and any other
third-party/Kubernetes packages as the second group, and place the project-local
package "github.com/devfile/devworkspace-operator/pkg/library/constants" as the
final group; update the import ordering in the file's import block accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 65e38a1c-3820-48a8-bb76-6861f3d85b6d

📥 Commits

Reviewing files that changed from the base of the PR and between d312f58 and f20a64c.

📒 Files selected for processing (1)
  • project-clone/internal/global.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant